Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition

Authors

  • Yanzhang He
  • Eric Fosler-Lussier

Abstract

Discriminative segmental models, such as segmental conditional random fields (SCRFs), have recently been applied successfully to speech recognition through lattice rescoring, where they integrate detectors across different levels of units, such as phones and words. However, lattice generation has been constrained by a baseline decoder, typically a frame-based hybrid HMM-DNN system, which still suffers from the well-known frame-independence assumption. In this paper, we propose to use SCRFs with DNNs directly as the acoustic model: a one-pass, unified framework for direct word decoding that can use local phone classifiers, phone transitions and long-span features to model phones or sub-phonetic segments of variable length. We describe a WFST-based approach for combining the proposed acoustic model efficiently with the language model in first-pass word recognition. Our evaluation on the WSJ0 corpus shows that our SCRF-DNN system outperforms both a hybrid HMM-DNN system and a frame-level CRF-DNN system using the same monophone label space.
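
To make "segments with variable length" concrete, below is a minimal sketch of generic segmental (semi-Markov) Viterbi decoding, the kind of search in which a model scores whole variable-length segments rather than independent frames. It is not the authors' SCRF-DNN system: the segment feature (summed per-frame label log-scores plus a constant `seg_bias`), the names `segment_score` and `segmental_viterbi`, and the maximum segment length `max_len` are hypothetical choices for illustration, and the WFST composition with the language model is omitted entirely.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int, int]  # (start_frame, end_frame_exclusive, label)


def segment_score(frame_scores: Sequence[Sequence[float]], start: int, end: int,
                  label: int, seg_bias: float) -> float:
    """Hypothetical segment feature: summed per-frame log-scores for `label`
    over frames [start, end), plus a constant per-segment bias."""
    return sum(frame_scores[t][label] for t in range(start, end)) + seg_bias


def segmental_viterbi(frame_scores: Sequence[Sequence[float]],
                      trans: Sequence[Sequence[float]],
                      max_len: int,
                      seg_bias: float = 0.0) -> Tuple[float, List[Segment]]:
    """Jointly choose segment boundaries and labels maximising the total score.

    frame_scores[t][y]: log-score of label y at frame t (e.g. DNN outputs).
    trans[p][y]: log-score for a segment labelled y following one labelled p.
    max_len: longest segment (in frames) the search considers.
    """
    T, K = len(frame_scores), len(frame_scores[0])
    neg_inf = float("-inf")
    # best[t][y]: best score for frames [0, t) whose last segment has label y.
    best = [[neg_inf] * K for _ in range(T + 1)]
    # back[t][y]: (start of that last segment, label of the segment before it).
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [
        [None] * K for _ in range(T + 1)
    ]

    for t in range(1, T + 1):
        for y in range(K):
            for d in range(1, min(max_len, t) + 1):
                s = t - d
                seg = segment_score(frame_scores, s, t, y, seg_bias)
                if s == 0:
                    # First segment of the utterance: no incoming transition.
                    if seg > best[t][y]:
                        best[t][y], back[t][y] = seg, (0, None)
                    continue
                for p in range(K):
                    cand = best[s][p] + trans[p][y] + seg
                    if cand > best[t][y]:
                        best[t][y], back[t][y] = cand, (s, p)

    # Trace back from the best-scoring label of the final segment.
    y = max(range(K), key=lambda k: best[T][k])
    score = best[T][y]
    segments: List[Segment] = []
    t = T
    while t > 0:
        s, p = back[t][y]
        segments.append((s, t, y))
        t, y = s, p
    segments.reverse()
    return score, segments
```

For example, with two labels and four frames whose scores favour label 0 and then label 1, `segmental_viterbi([[0, -5], [0, -5], [-5, 0], [-5, 0]], [[0, 0], [0, 0]], max_len=4, seg_bias=-1.0)` returns a score of -2.0 with segments `[(0, 2, 0), (2, 4, 1)]`; the negative `seg_bias` discourages splitting a run of frames into extra segments. The per-segment sum is recomputed for clarity; prefix sums over `frame_scores` would make each segment score constant-time.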

Similar resources

Segmental Recurrent Neural Networks for End-to-End Speech Recognition

We study the segmental recurrent neural network for end-to-end acoustic modelling. This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction. Compared to most previous CRF-based acoustic models, it does not rely on an external system to provide features or segmentation boundaries. Instead, this model marginalises out all t...

Deep segmental neural networks for speech recognition

Hybrid systems which integrate the deep neural network (DNN) and hidden Markov model (HMM) have recently achieved remarkable performance in many large vocabulary speech recognition tasks. These systems, however, still rely on the HMM and assume that the acoustic scores for the (windowed) frames are independent given the state, suffering from the same difficulty as the previous GMM-HMM systems...

Acoustic Modeling Based on Deep Conditional Random Fields

Acoustic modeling based on Hidden Markov Models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. In continuous density HMMs, the state scores are computed using Gaussian mixture models. On the other hand, Deep Neural Networks (DNN) can be used to compute the HMM state scores. This leads to significant improvement in the recognition accuracy. Conditional Random Fields...

Recent Improvements on Error Detection for Automatic Speech Recognition

Automatic speech recognition (ASR) offers the ability to access the semantic content present in spoken language within audio and video documents. While acoustic models based on deep neural networks have recently improved the performance of ASR systems significantly, automatic transcriptions still contain errors. Errors perturb the exploitation of these ASR outputs by introducing noise to the te...

Using word confusion networks for slot filling in spoken language understanding

Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU) because of automatic speech recognition (ASR) errors. To improve the performance of slot filling, a successful approach is to use a statistical model that is trained on ASR one-best hypotheses. State-of-the-art models for slot filling rely on discriminative sequence modeling methods, s...

Publication date: 2015